    Calcified amorphous tumor of left atrium

    Improving protein secondary structure prediction based on short subsequences with local structure similarity

    BACKGROUND: When characterizing the structural topology of proteins, protein secondary structure (PSS) plays an important role in analyzing and modeling protein structures because it represents the local conformation of amino acids into regular structures. Although PSS prediction has been studied for decades, prediction accuracy has reached a bottleneck at around 80%, and further improvement is very difficult.
RESULTS: In this paper, we present an improved dictionary-based PSS prediction method called SymPred, and a meta-predictor called SymPsiPred. We adopt the concept behind natural language processing techniques and propose synonymous words to capture local sequence similarities in a group of similar proteins. A synonymous word is an n-gram pattern of amino acids that reflects the sequence variation in a protein's evolution. We generate a protein-dependent synonymous dictionary from a set of protein sequences for PSS prediction. On a large non-redundant dataset of 8,297 protein chains (DsspNr-25), the average Q3 of SymPred and SymPsiPred is 81.0% and 83.9%, respectively. On the two latest independent test sets (EVA_Set1 and EVA_Set2), the average Q3 of SymPred is 78.8% and 79.2%, respectively. SymPred outperforms other existing methods by 1.4% to 5.4%. We study two factors that may affect the performance of SymPred and find that it is very sensitive to the number of proteins of both known and unknown structures. This finding implies that SymPred and SymPsiPred have the potential to achieve higher accuracy as the number of protein sequences in the NCBInr and PDB databases increases.
CONCLUSIONS: Our experimental results show that local similarities in protein sequences typically exhibit conserved structures, which can be used to improve the accuracy of secondary structure prediction. As an application of synonymous words, we demonstrate a sequence alignment generated from the distribution of synonymous words shared by a pair of protein sequences. The two sequences, which are very dissimilar at the sequence level but very similar at the structural level, can be aligned nearly perfectly. The SymPred and SymPsiPred prediction servers are available at http://bio-cluster.iis.sinica.edu.tw/SymPred/.
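
As a rough illustration of the Q3 figures quoted in this abstract: Q3 is conventionally the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the reference label. The sketch below is a minimal Python version under that assumption. The function name `q3_score` and the example strings are hypothetical and are not taken from SymPred.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("empty sequence")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical 10-residue example (not real data): 8 of the 10 labels agree.
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
print(q3_score(predicted, observed))  # 80.0
```

An average Q3 over a dataset, as reported above, would then be the mean of this per-chain score (or a residue-weighted variant; the abstract does not say which).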

    The Multi-Q web server for multiplexed protein quantitation

    The Multi-Q web server provides an automated data analysis tool for multiplexed protein quantitation based on the iTRAQ labeling method. The web server is designed as a platform that can accommodate various input data formats from search engines and mass spectrometer manufacturers. Compared to the previous stand-alone version, the new web server version provides many enhanced features and flexible options for quantitation. The workflow of the web server is represented by a quantitation wizard so that the tool is easy to use. It also provides a friendly interface that helps users configure their parameter settings before running the program. The web server generates a standard report for quantitation results. In addition, it allows users to customize their output reports so that information of interest can be easily highlighted. The output also provides visualization of mass spectral data so that users can conveniently validate the results. The Multi-Q web server is a fully automated and easy-to-use quantitation tool that is suitable for large-scale multiplexed protein quantitation. Users can download the Multi-Q Web Server from http://ms.iis.sinica.edu.tw/Multi-Q-Web.

    NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition

    BACKGROUND: Biomedical named entity recognition (Bio-NER) is a challenging problem because, in general, biomedical named entities of the same category (e.g., proteins and genes) do not follow one standard nomenclature. They have many irregularities and sometimes appear in ambiguous contexts. In recent years, machine-learning (ML) approaches have become increasingly common and now represent the cutting edge of Bio-NER technology. This paper addresses three problems faced by ML-based Bio-NER systems. First, most ML approaches usually employ singleton features that comprise one linguistic property (e.g., the current word is capitalized) and at least one class tag (e.g., B-protein, the beginning of a protein name). However, such features may be insufficient in cases where multiple properties must be considered. Adding conjunction features that contain multiple properties can be beneficial, but it would be infeasible to include all conjunction features in an NER model since memory resources are limited and some features are ineffective. To resolve the problem, we use a sequential forward search algorithm to select an effective set of features. Second, variations in the numerical parts of biomedical terms (e.g., "2" in the biomedical term IL2) cause data sparseness and generate many redundant features. In this case, we apply numerical normalization, which solves the problem by replacing all numerals in a term with one representative numeral to help classify named entities. Third, the assignment of NE tags does not depend solely on the target word's closest neighbors, but may depend on words outside the context window (e.g., a context window of five consists of the current word plus two preceding and two subsequent words). We use global patterns generated by the Smith-Waterman local alignment algorithm to identify such structures and modify the results of our ML-based tagger. This is called pattern-based post-processing. 
RESULTS: To develop our ML-based Bio-NER system, we employ conditional random fields, which have performed effectively in several well-known tasks, as our underlying ML model. Adding selected conjunction features, applying numerical normalization, and employing pattern-based post-processing improve the F-score by 1.67%, 1.04%, and 0.57%, respectively. The combined increase of 3.28% yields a total score of 72.98%, which is better than that of the baseline system, which uses only singleton features. CONCLUSION: We demonstrate the benefits of using the sequential forward search algorithm to select effective conjunction feature groups. In addition, we show that numerical normalization can effectively reduce the number of redundant and unseen features. Furthermore, the Smith-Waterman local alignment algorithm can help ML-based Bio-NER deal with difficult cases that need longer context windows.
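
Two of the ideas in this abstract, numerical normalization and the fixed context window, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which representative numeral is used or whether a multi-digit number collapses to a single numeral, so the choice of "1" and the run-of-digits behavior below are assumptions, and the names `normalize_numerals` and `context_window` are hypothetical.

```python
import re


def normalize_numerals(term: str, representative: str = "1") -> str:
    """Replace every run of digits in a term with one representative numeral.

    Assumption: a run such as "12" collapses to a single numeral.
    """
    return re.sub(r"\d+", representative, term)


def context_window(tokens, index, size=5):
    """Return the current token plus (size - 1) // 2 neighbours on each side.

    With size=5 this is the current word plus two preceding and two
    subsequent words, truncated at the sentence boundaries.
    """
    half = (size - 1) // 2
    start = max(0, index - half)
    return tokens[start:index + half + 1]


# Hypothetical examples, not taken from the paper's data.
print(normalize_numerals("IL2"))   # IL1
print(normalize_numerals("IL12"))  # IL1
tokens = ["the", "IL2", "gene", "is", "expressed", "in", "T", "cells"]
print(context_window(tokens, 2))   # ['the', 'IL2', 'gene', 'is', 'expressed']
```

Mapping "IL2" and "IL12" to the same string is what reduces sparseness: features built on the normalized term are shared across numeric variants.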

    Protein subcellular localization prediction of eukaryotes using a knowledge-based approach

    BACKGROUND: The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive. Thus, computational approaches become highly desirable. Most PSL prediction systems are established for single-localized proteins. However, a significant number of eukaryotic proteins are known to be localized into multiple subcellular organelles. Many studies have shown that proteins may simultaneously locate or move between different cellular compartments and be involved in different biological processes with different roles.
RESULTS: In this study, we propose a knowledge-based method, called KnowPred_site, to predict the localization site(s) of both single-localized and multi-localized proteins. Based on local similarity, we can identify the "related sequences" for prediction. We construct a knowledge base to record the possible sequence variations for protein sequences. When predicting the localization annotation of a query protein, we search against the knowledge base and use a scoring mechanism to determine the predicted sites. We downloaded the dataset from ngLOC, which consisted of ten distinct subcellular organelles from 1923 species, and performed ten-fold cross-validation experiments to evaluate KnowPred_site's performance. The experimental results show that KnowPred_site achieves higher prediction accuracy than ngLOC and the Blast-hit method. For single-localized proteins, the overall accuracy of KnowPred_site is 91.7%. For multi-localized proteins, the overall accuracy of KnowPred_site is 72.1%, which is significantly higher (by 12.4%) than that of ngLOC. Notably, half of the proteins in the dataset for which no Blast hit above a specified threshold can be found are still correctly predicted by KnowPred_site.
CONCLUSION: KnowPred_site demonstrates the power of identifying related sequences in the knowledge base. The experimental results show that even when sequence similarity is low, local similarity is effective for prediction, and that KnowPred_site is a highly accurate prediction method for both single- and multi-localized proteins. It is worth mentioning that the prediction process of KnowPred_site is transparent and biologically interpretable, and that it shows the set of template sequences used to generate the prediction result. The KnowPred_site prediction server is available at http://bio-cluster.iis.sinica.edu.tw/kbloc/.

    Hypolipidemic Effects of Three Purgative Decoctions

    In traditional Chinese medicine (TCM), purgation is indicated when a person suffers an illness due to the accumulation of evil internal heat. Obese individuals with a large belly, red face, thick and yellow tongue fur, constipation, and avoidance of heat are thought to have accumulated evil internal heat, and TCM doctors also treat them with purgatives such as Ta-Cheng-Chi-Tang (TCCT), Xiao-Chen-Chi-Tang (XCCT), and Tiao-Wei-Chen-Chi-Tang (TWCCT). In previous studies, our group found that TCCT has potent anti-inflammatory activity, and that XCCT is an effective antioxidant. Since rhubarb is the principal herb in these three prescriptions, we first present a thorough review of the literature on the demonstrated effect (or lack of effect) of rhubarb and rhubarb-containing polyherbal preparations on lipid and weight control. We then investigate the anti-obesity and lipid-lowering effects of TCCT, XCCT, TWCCT, and rhubarb extracts using two animal models. TWCCT lowered the serum triglyceride concentration as much as fenofibrate in Triton WR-1339-treated mice. Daily supplementation with XCCT and TWCCT significantly attenuated high-fat-diet-induced hypercholesterolemia in rats. In addition, TWCCT significantly lowered high-fat-diet-induced hypertriglyceridemia. Feeding these extracts to rats on a high-fat diet did not cause loose stools, diarrhea, or other deleterious effects on renal or hepatic function; however, none of the extracts lowered the body weight of rats fed a high-fat diet. In conclusion, the results suggest that XCCT and TWCCT might exert beneficial effects in the treatment of hyperlipidemia.

    Protein subcellular localization prediction based on compartment-specific features and structure conservation

    BACKGROUND: Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determination of subcellular localization using experimental approaches is time-consuming; thus, computational approaches become highly desirable. Extensive studies of localization prediction have led to the development of several methods, including composition-based and homology-based methods. However, their performance might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from low coverage in high-throughput proteomic analyses due to the lack of information to characterize unknown proteins. RESULTS: We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers, in which biological features derived from Gram-negative bacteria translocation pathways are incorporated. In the structural homology approach, we employ secondary structure alignment for structural similarity comparison and assign the known localization of the top-ranked protein as the predicted localization of a query protein. The hybrid method achieves overall accuracy of 93.7% and 93.2% using ten-fold cross-validation on the benchmark data sets. On the evaluation data sets, our method also attains a prediction accuracy of 84.0%, especially when testing on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets with sequence identity below 30%.
CONCLUSION: Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant improvement. The biological features are interpretable and can be applied in advanced analyses and experimental designs. Moreover, combining the SVM model with the structural homology approach further improves the overall accuracy, which suggests that structural conservation could be a useful indicator for inferring localization in addition to sequence homology. The proposed method can be used in large-scale analyses of proteomes.
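
The one-versus-one scheme mentioned in this abstract trains one binary classifier per pair of classes, so k classes need k(k-1)/2 classifiers. A common way to combine them is majority voting; the abstract does not state how the authors combine the pairwise outputs, so voting is an assumption here. In the sketch below the class labels, the `toy_decide` stand-in for a trained binary SVM, and the scores are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(classes, pairwise_decide, sample):
    """Majority vote over all pairwise (one-versus-one) binary decisions.

    pairwise_decide(a, b, sample) must return either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, sample)] += 1
    # Ties are broken in favour of the label that received a vote first.
    return votes.most_common(1)[0][0]


# Hypothetical labels for five localization sites (illustrative only).
classes = ["CP", "IM", "PP", "OM", "EC"]


def toy_decide(a, b, sample):
    """Stand-in for a trained binary SVM: pick the label with the higher toy score."""
    return a if sample[a] >= sample[b] else b


# Made-up per-class scores for one query protein.
sample = {"CP": 0.1, "IM": 0.7, "PP": 0.3, "OM": 0.2, "EC": 0.05}
n_classifiers = len(list(combinations(classes, 2)))
print(n_classifiers)                                    # 10
print(one_vs_one_predict(classes, toy_decide, sample))  # IM
```

In a real system each pair would have its own trained classifier with its own features, which is where compartment-specific features such as those described above would enter.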